This is ETC5521 Assignment 1 by Team brolga, comprising Dhruv Nirmal and Gui Gao.
Classic board games have been around for decades, bringing people together around a shared table. In Greece there are many popular board game associations and ‘fan clubs’ that organise tournaments and offer a wealth of prizes.
Today, although computer games are in a golden age of technology-driven development, many great board games are still released each year and attract a lot of attention.
Board Game Geek is a specialist board game website. Users can find every board game and information about it. This information includes descriptions of the games, reviews, user ratings, professional ratings, prices, where to buy and more.
My teammate and I are both interested in board games and have tried many interesting ones. Studying this large dataset can help us understand board games from a different perspective, including the overall landscape of board games and how it has changed over time.
So, we have tried to dig deeper into the data itself to show some reports and interesting data visualisations of the results.
The original data contains about 15-19 million reviews. How the dataset is filtered may affect the results, in particular the analysis of whether the mean game ratings follow a normal distribution.
The dataset also contains some irregularly recorded values. For example, some games have negative years or 0 in the yearpublished variable, which may limit our analysis.
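The kind of filtering this limitation calls for can be sketched in base R. This is a minimal illustration, not the filter used in this report: the toy values and the two cut-off years are assumptions, and only the column name yearpublished comes from the dataset.

```r
# Toy stand-in for the details data; only the column name `yearpublished`
# comes from the report, the values are invented for illustration.
details_toy <- data.frame(
  id            = 1:6,
  yearpublished = c(1995, 0, -3500, 2017, 2023, NA)
)

# Assumed cut-offs for a "plausible" publication year (hypothetical values).
min_year <- 1900
max_year <- 2022

# Keep only rows whose year is present and inside the assumed range.
details_clean <- subset(
  details_toy,
  !is.na(yearpublished) & yearpublished >= min_year & yearpublished <= max_year
)

# Number of rows dropped by the filter.
nrow(details_toy) - nrow(details_clean)
```

Reporting how many rows a filter removes makes it easier to judge how much the cleaning step could influence later results.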
Our data comes from Kaggle by way of Board Games Geek, with a hat tip to David and Georgios. The data can be found at: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25
We have two initial datasets, ratings and details. The initial size of the ratings dataset is around 4.8MB and contains 21631 observations and 10 columns. The original size of the details dataset is approximately 32.7MB, with 21831 observations and 23 columns.
| class | description |
|---|---|
| double | Game ID |
| character | Game name |
| double | Average rating on Board Games Geek (1-10) |

| class | description |
|---|---|
| double | Game ID |
| double | Year game was published |
| character | Game mechanic - how to play the game (separated by comma) |
| number | Minimum number of players required to play |
| number | Maximum number of players required to play |
| number | Average playing time of a game |
| number | People who own a game |
| number | People who trade a game |
| number | People who want to own a game |
| number | People who wish to own a game |
Q1. How do board game ratings change by year of publication?
Q2. Trends in game mechanics over time.
Q3. Trends in board game publication rates over time.
Q4. What are the common game mechanics, and how has their prevalence changed?
Q5. What types of games are becoming increasingly popular?
Q6. How many games have each listed how many mechanics?
Q7. What is the distribution of ratings for the board games?
Q8. Is there a linear relationship between the year of release of a game and the average rating it receives?
Q9. Are Board Game Descriptions more positive or negative?
Q10. What are the 10 most common words used in board game descriptions?
Q11. Is there any relationship between average game time and ownership or min/max players?
Q12. How do the number of people who wish to own a game and the number who own it plot against each other?
Q13. How do rank and average rating plot against each other? (Highly ranked games should have higher ratings.)
Q14. Which age group prefers which type of game (e.g. medical, building)? For example, younger children might be interested in building games and teenagers in games like Monopoly.
Q15. Are people intrigued by a particular game designer?
Q16. How did the era of electronic games (starting from 2010-11) affect the number of people who want, wish for or own board games?
Q17. Is it right to assume that a board game publisher publishes a single type of board game, and do publishers focus only on a certain age group?
Q18. Are people intrigued by a particular game publisher?
Q19. How accurate has the Bayes average been?
Q20. Which era or decade played a big role in people playing more board games?
Q1. What are the common game mechanics, and how has their prevalence changed?
Q2. What is the distribution of ratings for the board games?
Q3. Is there any relationship between the average playing time of a game and its ownership (or products sold), or the min/max players required to play it?
Q4. How did the era of electronic games (starting from 2010-11) affect the number of people who want, wish for or own board games?
Q1. The most common game mechanics may be ‘Acting’, ‘Dice Rolling’ and ‘Hand Management’. I guess many game mechanics will become more popular over time.
Q2. I guess the ratings for the games follow a normal distribution.
Q3. A longer average playing time might mean fewer owners, as people might not want to invest too much time in a game, although such games might be popular among bigger groups because board games keep a group engaged. As the required number of players increases, however, the number of products sold should drop.
Q4. Demand for board games should be lower after the start of the electronic games era.
| boardgamemechanic | n |
|---|---|
| Dice Rolling | 6112 |
| Hand Management | 4421 |
| Set Collection | 2936 |
| Variable Player Powers | 2719 |
| Hexagon Grid | 2371 |
| Simulation | 2099 |
| Card Drafting | 1869 |
| Tile Placement | 1805 |
| Modular Board | 1697 |
| Grid Movement | 1635 |
According to the table and column plot above, the most common game mechanic is, unsurprisingly, Dice Rolling. This is because Dice Rolling is a mechanic that can be used in many games. It has been around for a long time: ancient peoples could make simple dice out of stones, clay, bones and so on, so Dice Rolling is often seen as the most dominant symbol of board games (Yermolaieva & Brown, 2017).
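A count table like the one above can be produced by splitting the comma-separated mechanic strings and tallying the pieces. The sketch below uses base R on invented toy strings, so it illustrates the idea rather than reproducing our pipeline. It assumes the mechanics are already plain comma-separated text; if the raw strings carry brackets or quotes, a gsub call before splitting would be needed to strip them.

```r
# Toy stand-in for the mechanic column: one string per game, mechanics
# separated by commas (values invented for illustration).
mechanics_raw <- c(
  "Dice Rolling, Hand Management",
  "Dice Rolling",
  "Set Collection, Dice Rolling, Hand Management",
  NA
)

# Split each string on commas, then trim surrounding whitespace.
mechanics_long <- trimws(unlist(strsplit(mechanics_raw, ",", fixed = TRUE)))

# Drop missing and empty entries before counting.
mechanics_long <- mechanics_long[!is.na(mechanics_long) & mechanics_long != ""]

# Count occurrences and sort from most to least common.
mechanic_counts <- sort(table(mechanics_long), decreasing = TRUE)
mechanic_counts
```

A game that lists several mechanics contributes one count to each of them, so the counts sum to more than the number of games.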
Before analysing this question, I made an inference based on the experience of my family and friends: the most common board game mechanics will become more and more popular. The lollipop plot above shows that the top 20 most common game mechanics have become more and more popular over the past few decades, which is consistent with my assumption. One reason is that modern board games are starting to include more mechanics and the variety of games is becoming richer over time, so board games as a whole can appeal to a wider audience. Another very important reason is that we now have a better standard of living and more free time, and the increase in leisure time is an obvious driver of demand for entertainment products such as board games.
| statistic | p.value | method | alternative |
|---|---|---|---|
| 0.02198876 | 1.6e-09 | One-sample Kolmogorov-Smirnov test | two-sided |
From the table above we can see that the test statistic is 0.021989, with a corresponding p-value of 1.647e-09. Since the p-value is less than 0.05, we reject the null hypothesis. We have sufficient evidence that the board game ratings in this sample do not come from a normal distribution.
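For readers who want to see the mechanics of the test, here is a minimal sketch of a one-sample Kolmogorov-Smirnov test against a normal distribution in base R. The sample is simulated and deliberately skewed, so it is not our ratings data, and comparing against a normal with the sample's own mean and standard deviation is an assumption about how the reference distribution is specified.

```r
set.seed(1)

# Simulated, strongly right-skewed sample; NOT the report's ratings data.
ratings_toy <- rexp(2000, rate = 1)

# One-sample KS test against a normal distribution whose mean and sd
# are estimated from the same sample (an assumption of this sketch).
ks_result <- ks.test(
  ratings_toy, "pnorm",
  mean = mean(ratings_toy), sd = sd(ratings_toy)
)

# D: the largest gap between the empirical CDF and the normal CDF.
ks_result$statistic

# Reject normality at the 5% level when this is below 0.05.
ks_result$p.value
```

When the mean and standard deviation are estimated from the same sample, the standard KS p-value tends to be conservative, and with very large samples even small departures from normality produce tiny p-values, so a histogram or Q-Q plot is a useful companion to the test.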
Figure 5.1: Relationship between average game time and games owned
##
## Call:
## lm(formula = owned ~ playingtime, data = Q3_dataset)
##
## Coefficients:
## (Intercept) playingtime
## 1490.36912 -0.02701
Figure 5.2: Relationship between average game time and games owned faceted for different minimum players required
| minplayers | sum |
|---|---|
| 0 | 16197 |
| 1 | 6801485 |
| 2 | 20498836 |
| 3 | 3295642 |
| 4 | 631582 |
| 5 | 174119 |
| 6 | 26102 |
| 7 | 4258 |
| 8 | 29110 |
| 10 | 130 |
In the plot in Figure 5.1, one can observe that as the average playing time of a board game increases, the number of people who own that game decreases. I had assumed this because games that take a long time to finish might be a less popular option for people who are short of time.
To verify this, I fitted a linear model of owned on playingtime and found that the estimated slope was negative (about -0.027).
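The check described above can be sketched as follows. The data frame here is simulated, so the coefficients will not match our output; only the formula owned ~ playingtime mirrors the call shown earlier, and the name Q3_toy is hypothetical.

```r
set.seed(42)

# Simulated stand-in for Q3_dataset; column names follow the lm() call
# in the report, the values are invented for illustration.
n <- 500
Q3_toy <- data.frame(playingtime = runif(n, min = 5, max = 300))
Q3_toy$owned <- 1500 - 2 * Q3_toy$playingtime + rnorm(n, sd = 100)

# Fit the same model form as in the report: owned explained by playingtime.
fit <- lm(owned ~ playingtime, data = Q3_toy)

# Estimated slope: the change in owned per one-unit increase in playingtime.
coef(fit)["playingtime"]

# Slope with its standard error and p-value, to judge more than the sign.
summary(fit)$coefficients["playingtime", c("Estimate", "Std. Error", "Pr(>|t|)")]
```

The sign of the slope gives the direction of the relationship; the standard error and p-value indicate whether a slope of that size is distinguishable from zero.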
A game with a longer playing time might sell fewer copies, but it can be popular with people who play in big groups. I expected that as the number of players required to play a game increases, its sales should drop. In Figure 5.2, one can observe that as the minimum number of required players increases, the number of copies sold decreases drastically (see also 5.3), as smaller groups of people can get together for board games more often.
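Totals of owners by minimum player count, like those tabulated above, can be computed with a grouped sum. This is a minimal base R sketch on invented values; the column names minplayers and owned follow the dataset, while games_toy is a hypothetical name.

```r
# Toy stand-in for the games data (values invented for illustration).
games_toy <- data.frame(
  minplayers = c(1, 2, 2, 3, 2, 1),
  owned      = c(100, 250, 50, 30, 700, 20)
)

# Total number of owners for each minimum player count.
owned_by_minplayers <- aggregate(owned ~ minplayers, data = games_toy, FUN = sum)

# Rename the summed column to match the table layout used in the report.
names(owned_by_minplayers)[2] <- "sum"
owned_by_minplayers
```

Because a total also reflects how many games fall into each group, a per-game mean or median (for example FUN = median) could be a useful complement when comparing groups of very different sizes.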
The top 20 most common game mechanics have become more and more popular over the past few decades, with Dice Rolling the most common. We obtained this through text analysis of the selected columns, using functions such as separate_rows and gsub. Plotting a histogram and performing a KS test on the board game ratings led us to conclude that the ratings in this dataset do not follow a normal distribution. The average playing time of a board game and the minimum number of players required to play it are both negatively related to the number of games sold. We examined the linear model coefficients to confirm our assumption and observation.
Data pivoting let us rearrange the columns and rows in a report so that we could view the data from different perspectives. The era of mobile games appears to have reduced people’s interest in board games: after 2012-13, the number of games owned, wanted, wished for and traded dropped drastically.
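The pivoting step mentioned above can be illustrated with a small base R sketch that reshapes one column per measure into a long layout with one row per year and measure. The yearly totals are invented, and the column names wanting, wishing and trading are assumptions made for this example; it does not reproduce the code used in the report.

```r
# Toy yearly totals in wide form (values invented; the names `wanting`,
# `wishing` and `trading` are assumed for illustration).
yearly_toy <- data.frame(
  yearpublished = c(2010, 2011, 2012),
  owned   = c(500, 650, 400),
  wanting = c(50, 70, 40),
  wishing = c(120, 160, 90),
  trading = c(20, 25, 15)
)

measure_cols <- c("owned", "wanting", "wishing", "trading")

# stack() puts all measure columns into one `values` column and records
# the source column in `ind`; the year is repeated once per measure.
long <- data.frame(
  yearpublished = rep(yearly_toy$yearpublished, times = length(measure_cols)),
  stack(yearly_toy[, measure_cols])
)
names(long) <- c("yearpublished", "count", "measure")

head(long)
```

In the long layout a single plot can map measure to colour or facet, which makes it easier to compare how the four quantities move over the years.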
Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, Florida.
Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.
Harmon, J. (2022). Tidy Tuesday. Retrieved from https://www.tidytuesday.com/
Lüdecke et al. (2021). performance: An R Package for Assessment, Comparison and Testing of Statistical Models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139
Robinson D (2022). drlib: Personal R package of David Robinson. R package version 0.1.1.
Robinson D, Hayes A, Couch S (2022). broom: Convert Statistical Objects into Tidy Tibbles. R package version 1.0.0, https://CRAN.R-project.org/package=broom.
tidytuesday/data/2022/2022-01-25 at master · rfordatascience/tidytuesday. (2022). Retrieved from https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.
Wickham H, Girlich M (2022). tidyr: Tidy Messy Data. R package version 1.2.0, https://CRAN.R-project.org/package=tidyr.
Wickham H (2022). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.1, https://CRAN.R-project.org/package=stringr.
Yermolaieva S, Brown JA (2017). Dice design deserves discourse. Game & Puzzle Design, 3(2), 64-70.
Xie Y (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39.
Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.